Abstract



PURPOSE: To reduce retrieval omissions by substantially broadening the
character string to be retrieved, so as to compensate for the
incompleteness of character recognition, when a character string is
retrieved from character codes obtained by performing character
recognition on a document.

CONSTITUTION: An information processing means 2 broadens the range of
the character string to be retrieved, reads out the character code
information within a designated range from a storage means 3, and
retrieves a character string that coincides with any of the character
strings in the broadened range. When a character string is retrieved in
this way from character code information obtained by applying character
recognition to document information input as image information, the
character string is replaced with the sum-set of the character string
and all character strings that it may be erroneously recognized as, by
referring to a correspondence table representing the tendency of
erroneous recognition peculiar to the character recognition algorithm,
so that the retrieval range is substantially broadened. In this way,
the frequency of retrieval omissions can be reduced remarkably even
when a character recognition means 5 makes a recognition error.



SPECIFICATION 



1. Title of the Invention

Information Retrieval Method and Information Storage Apparatus



2. Scope of Claims 

(1) An information retrieval method for an information
storage apparatus, wherein when a character string matching
a predetermined condition equation is retrieved from stored
character code information obtained through recognition
of characters input as image information, a retrieval range
is broadened in order to compensate for incompleteness
inherent in a character recognition algorithm.

(2) An information storage apparatus comprising:
image storage means for storing document information input
as image information; recognition means for recognizing
characters contained in the document information; auxiliary
information storage means for storing code information
output from said recognition means; and retrieval means
for retrieving the code information in accordance with
a condition equation input as a retrieval range, wherein
said retrieval means retrieves the code information by
broadening a substantial retrieval range so that
incompleteness of said recognition means can be compensated.

(3) An information storage apparatus according to
claim 2, wherein said retrieval means has a table storing
a correspondence between each character and another
character or characters easy to be erroneously recognized
from the first mentioned character, the table being formed
in accordance with erroneous recognition tendency inherent
in a recognition algorithm to be used by said recognition
means, and said retrieval means retrieves a character string
contained in the retrieval condition equation as a sum-set
of each character of the character string and the other
character or characters which are easy to be erroneously
recognized from the first mentioned character and are related
by the correspondence table.
3. Detailed Description of the Invention
Industrial Application Field 

The present invention relates to an information storage 
apparatus for electronically storing a document input as 
image information and to an information retrieval method. 

Related Art

An information storage apparatus called a document
filing apparatus, which electronically stores documents and
drawings input as image information, is coming into wide use
in business departments which mainly manage documents and
drawings.

With reference to the accompanying drawings, an example
of a conventional information storage apparatus will be
described.

Fig. 5 is a flow chart illustrating the operation
of a conventional information storage apparatus. Fig. 5(a)
illustrates a document registration operation, and Fig.
5(b) illustrates a document retrieval operation.

The operation of the information storage apparatus
operating as illustrated in Figs. 5(a) and 5(b) will be
described in detail below.

When a document is registered, document information
is captured as image information by using an image input
apparatus such as an image scanner. The captured image
information is stored in a storage device such as an optical
disc. Next, auxiliary information to be used for retrieval,
such as the document name, classification, author and
keywords of the stored image information,
is input from a keyboard. The auxiliary information and
information representative of the image information are
stored in the storage device at a predetermined location.

When the image information stored in the storage device
is retrieved, a retrieval condition for identifying the
auxiliary information is input from the keyboard to retrieve
the auxiliary information that is stored in the storage device
and matches the retrieval condition. After the auxiliary
information of the document information to be retrieved
is retrieved, the corresponding document can be read (for
example, refer to "Office Automation Guide", Ohm Co., pp.
111-113).

Problems to be Solved by the Invention 
With the above operation, however, it is essential
to enter the auxiliary information for retrieval when a
document is registered, so that registering a document
takes labor. Furthermore, if a plurality of persons register
and retrieve documents, the keywords used by those persons
must be kept consistent, and management of the keyword
system becomes complicated.

In view of these problems, the invention
provides an information retrieval method and an information
storage apparatus capable of retrieving a document without
entering auxiliary information for retrieval when the
document is registered.
Means for Solving the Problems 

In order to solve the above problems, according to
the retrieval method and information storage apparatus
of the invention, a desired character string is retrieved
from character code information obtained through
recognition of characters of a document input as image
information, wherein a character string is retrieved that
matches either the target correct character string or any
other character string that the target correct character
string may be erroneously recognized as, because of
incompleteness of the recognition algorithm, when it is
recognized from the image information of the document.
Operation 

According to the information retrieval method of the
invention, a desired character string is retrieved from
character code information obtained through character
recognition of a document input as image information.
Accordingly, a character string can be retrieved directly
from the document information without adding retrieval
auxiliary information such as keywords to the document
information. Since the retrieval condition is broadened in
order to compensate for incompleteness of character
recognition, retrieval omissions caused by erroneous
recognition can be avoided.

The principle of a method of compensating for
incompleteness of recognition will be described.

Fig. 3 is a conceptual diagram illustrating ideal
character recognition. In Fig. 3, areas a to h surrounded
by solid lines indicate existence areas of patterns of
virtual characters a to h, respectively. Areas A to H
surrounded by broken lines indicate areas of patterns
recognized as the characters a to h, respectively. Since
the areas a to h are completely included in the areas A
to H, respectively, it is obvious that the characters a
to h are all recognized correctly.

Fig. 4 is a conceptual diagram illustrating the case
in which characters are not recognized correctly. As in
Fig. 3, areas i to p surrounded by solid lines indicate
existence areas of patterns of virtual characters i to
p, respectively. Areas I to P surrounded by broken lines
indicate areas of patterns recognized as the characters
i to p, respectively. No character is recognized from areas
X to Z. In the example shown in Fig. 4, the areas
i to p are not all completely included in the areas I to P,
so that perfect character recognition is impossible. For
example, although the character i is recognized as the
character i in some cases, if the character i is written
with a pattern similar to the character j or n, the character
i is erroneously recognized as the character j or n. The
patterns of the characters m and o overlap, so that
erroneous recognition is inevitable unless means other
than pattern recognition, such as recognition of meaning
from the context of a passage, is used together with the
pattern recognition. Such a case occurs if different
character systems are used. Typical examples are a kanji
character "X" and a Greek character, and the numeral
"0" and the letter "O".

Which characters are erroneously recognized, and in what
manner, is a tendency inherent in a recognition algorithm.
If this tendency can be grasped, the
deficiency can be compensated for during retrieval. Consider
for example that the character i is erroneously recognized
as the character j. In this case, although the recognized
text is not suitable for printing or display, a search
omission can be avoided
if the characters (i + j + n) are searched for at the same time.
If the search range is broadened, unnecessary characters
are also retrieved. However, there is no practical problem
if the retrieval condition is narrowed down. In practice,
retrieval using one character hardly occurs, and retrieval
using a compound word made of a combination of several
characters is usually performed. Therefore, the retrieval
range is not substantially broadened very much. Consider
for example that a character string "Xij" is retrieved.
In this case, if the character "X" is broadened to "X
+ A " and the character " jj" is broadened to "jj + the
character string is broadened to "X^3 + A:£/ + Aft". However,
in this case, " A £T , "Ai", "A.*" and the like hardly exist
so that the substantial extension of the retrieval range
is very small.

As above, by compensating for incompleteness of 
character recognition during retrieval, search omissions 
can be reduced considerably. 
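
The text above says only that the erroneous-recognition tendency must be "grasped"; it does not say how. As a minimal sketch, assuming the tendency is measured by running the recognizer on samples whose correct characters are known, a correspondence table could be tallied as follows. The function name `build_correspondence_table`, the `min_count` threshold and the sample pairs are all hypothetical and do not come from the specification.

```python
from collections import defaultdict

# Stand-in for the special code assigned to a character that was not recognized.
UNRECOGNIZED = "?"


def build_correspondence_table(samples, min_count=1):
    """Map each written character to the set of codes the recognizer produced for it.

    samples: iterable of (written_char, recognized_char) pairs (assumed test data).
    min_count: confusions observed fewer times than this are ignored (assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for written, recognized in samples:
        counts[written][recognized] += 1

    table = {}
    for written, outcomes in counts.items():
        # The character itself is always kept so correct recognitions still match.
        candidates = {written}
        candidates.update(c for c, n in outcomes.items() if n >= min_count)
        table[written] = candidates
    return table


# Hypothetical observations: i is sometimes read as j or n, j as n or not at all.
samples = [
    ("i", "i"), ("i", "j"), ("i", "n"),
    ("j", "j"), ("j", "n"), ("j", UNRECOGNIZED),
    ("k", "k"),
]
table = build_correspondence_table(samples)
```

With these samples the entry for i becomes the set {i, j, n}, which is the kind of sum-set (i + j + n) described above.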



Embodiment

An information retrieval method according to an
embodiment of the invention will be described with reference
to the accompanying drawings.

Fig. 1 is a block diagram showing an information storage
apparatus according to a first embodiment of the invention.
In Fig. 1, an image input unit 1 captures images of a
hand-written or printed document. An information
processing unit 2 performs information input/output control
and various processes. A storage device 3 stores, when
necessary, information to be used by the information
processing unit 2. A code input unit 4 inputs auxiliary
information of the image information input from the image
input unit 1, a character string to be retrieved, and the
like. A character recognition unit 5 cuts characters out of
the image information supplied from the information processing
unit 2, recognizes them, and returns the obtained character
codes to the information processing unit 2. An output
unit 6 outputs specific information designated by the inputs
of the code input unit 4, information retrieved and extracted
in accordance with the inputs of the code input unit 4, and
other information. The operation of the information
storage apparatus constructed as above will be described
with reference to Figs. 1 and 2.
Fig. 2 is a flow chart illustrating an operation of
the information storage apparatus of the embodiment. Fig.
2(a) illustrates a document registration operation, and
Fig. 2(b) illustrates a document retrieval operation.

When a hand-written or printed document is registered,
the document is read as image information with the image
input unit 1 such as an image scanner. The image
information is transferred to the information processing
unit 2. The information processing unit 2 changes the format
of the image information and transfers the image information
to the storage device 3 to be stored as a file. When necessary,
auxiliary information such as a keyword is entered from
the code input unit 4. The information processing unit
2 changes the format of the auxiliary information and
transfers the auxiliary information to the storage device
3 to be stored at a predetermined location. The information
processing unit 2 also transfers the image information
to the character recognition unit 5, which sequentially
cuts characters out of the image information, recognizes
them and converts them into character codes. The character
recognition unit 5 returns the obtained
character codes to the information processing unit 2.
The information processing unit 2 changes the character
code information into a predetermined format, and sends
the character code information to the storage device 3
to be stored at a predetermined location. 
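
The registration flow of Fig. 2(a) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Record`, `StorageDevice`, `register` and `fake_recognize` are hypothetical names, the storage device 3 is modelled as an in-memory dictionary, and the character recognition unit 5 is replaced by a stand-in function that deliberately misreads j as n.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    image: bytes          # document stored as image information
    keywords: List[str]   # optional auxiliary information
    codes: str            # character codes returned by the recognizer


@dataclass
class StorageDevice:
    # In-memory stand-in for the storage device 3 (assumption).
    records: Dict[str, Record] = field(default_factory=dict)


def register(storage: StorageDevice, name: str, image: bytes,
             recognize: Callable[[bytes], str],
             keywords: Optional[List[str]] = None) -> Record:
    """Store the image, any keywords, and the recognized character codes."""
    codes = recognize(image)
    record = Record(image=image, keywords=list(keywords or []), codes=codes)
    storage.records[name] = record
    return record


def fake_recognize(image: bytes) -> str:
    # Stand-in recognizer: treats the bytes as text and misreads every j as n.
    return image.decode("ascii", errors="replace").replace("j", "n")


storage = StorageDevice()
record = register(storage, "memo", b"jmo", fake_recognize)
```

Note that no keyword is required here: the recognized codes alone are stored and later searched, which is the point of the embodiment.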

In retrieving desired information from the document
information registered in the above manner, first, a
limitation condition is input from the code input unit
4 in order to limit the retrieval target files. The limitation
condition may be written as a keyword. This limitation
condition is not entered if all files are used as retrieval
target files. Next, a character string to be retrieved
is input from the code input unit 4. This character string
is transferred to the information processing unit 2. In
accordance with a correspondence table in the information
processing unit 2, the information processing unit 2 broadens
the range of the character string. The process of broadening
the range of a character string will be described in more
detail.

The correspondence table is a table storing a
correspondence between each character and the
character or characters that the character recognition
unit 5 may erroneously output for the first mentioned
character. Assuming that the
areas of patterns of the characters i to p and the areas
I to P of patterns recognized as the characters i to p
are distributed as shown in Fig. 4, the correspondence
table for these characters is shown in Table 1.



Table 1

Character i    Character i, Character j, Character n
Character j    Character j, Character n, Character ?
Character k    Character k
Character l    Character l
Character m    Character m
Character n    Character n
Character o    Character o, Character m
Character p    Character p

In practice, this correspondence table is written using the
character code of each character. The character ? denotes a
special code assigned to a character that could not be recognized.

If a character string to be retrieved is "character
j + character m + character o" of three characters, the
character j is replaced with a sum-set of characters j,
n and ?, the character m is left as it is because Table 1
relates it only to itself, and the character o is replaced
with a sum-set of characters o and m.

Therefore, the retrieval is performed by using a sum-set
of six character strings:

"characters j, m and o"

"characters j, m and m"

"characters n, m and o" 

"characters n, m and m" 

"characters ?, m and o" 

"characters ?, m and m" 

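
The broadening step for this example can be sketched as a Cartesian product over the per-character sum-sets. The sketch assumes single letters stand in for the character codes, "?" stands in for the special unrecognized code, and `TABLE_1` and `broaden` are illustrative names rather than anything defined in the specification.

```python
from itertools import product

# Table 1 transcribed with letters standing in for character codes;
# "?" stands in for the special code of an unrecognized character.
TABLE_1 = {
    "i": ["i", "j", "n"],
    "j": ["j", "n", "?"],
    "k": ["k"],
    "l": ["l"],
    "m": ["m"],
    "n": ["n"],
    "o": ["o", "m"],
    "p": ["p"],
}


def broaden(query, table):
    """Return every string obtained by replacing each character with its sum-set."""
    # A character missing from the table is treated as matching only itself.
    per_char = [table.get(ch, [ch]) for ch in query]
    return ["".join(combo) for combo in product(*per_char)]


candidates = broaden("jmo", TABLE_1)
print(candidates)  # ['jmo', 'jmm', 'nmo', 'nmm', '?mo', '?mm']
```

The number of candidates is the product of the sum-set sizes (3 x 1 x 2 = 6 here), which is why the broadening stays small when most characters of a compound word map only to themselves.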

In the above manner, the information processing unit
2 broadens the range of the character string to be retrieved.
The information processing unit 2 reads the character code
information in the designated range from the storage device
3 and retrieves the character string matching any one of
the character strings in the broadened range. The
information processing unit 2 transfers the retrieved and
extracted document information to the output unit 6. If
the output unit 6 is a CRT, the document information is
displayed thereon, or if the output unit 6 is a printer, 
the document information is printed out. 
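
One way to match "any one of the character strings in the broadened range" without enumerating them is to turn each sum-set into a regular-expression character class. The following is a sketch under assumptions, not the method of the specification: the stored character code information is modelled as a dictionary of recognized text, and `broadened_pattern`, `retrieve`, the table excerpt and the sample documents are hypothetical.

```python
import re

# Excerpt of the correspondence table for the characters used below (assumed data).
table = {"j": ["j", "n", "?"], "m": ["m"], "o": ["o", "m"]}


def broadened_pattern(query, table):
    """Compile a pattern in which each query character becomes a character class."""
    parts = []
    for ch in query:
        alternatives = table.get(ch, [ch])
        # re.escape keeps characters such as "?" literal inside the class.
        parts.append("[" + "".join(re.escape(a) for a in alternatives) + "]")
    return re.compile("".join(parts))


def retrieve(query, documents, table):
    """Return the names of documents whose recognized text matches the broadened query."""
    pattern = broadened_pattern(query, table)
    return [name for name, text in documents.items() if pattern.search(text)]


# Hypothetical recognized text for three registered documents.
documents = {
    "doc_a": "kkjmokk",  # "jmo" recognized correctly
    "doc_b": "kk?mmkk",  # j not recognized, o misread as m
    "doc_c": "kkimokk",  # a different string that should not match
}
hits = retrieve("jmo", documents, table)
```

Here doc_a and doc_b are both found, whereas an exact search for "jmo" would have missed doc_b.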

As above, according to this embodiment, in retrieving
a character string from character code information obtained
through character recognition of document information input
as image information, the target character string is replaced
with a sum-set of the target character string and all
character strings that the target character string may be
erroneously recognized as, by referring
to the correspondence table indicating the tendency of
erroneous recognition inherent in the character recognition
algorithm. Since the retrieval is thus performed
with a substantially broadened retrieval range, the
frequency of retrieval omissions can be reduced considerably
even if the character recognition unit 5 erroneously
recognizes a character.

In the above embodiment, the retrieval target character
string is replaced with a sum-set of the target character
string and all other character strings likely to result
from erroneous recognition, by referring to the correspondence
table. The main point of the invention is that, if
character codes obtained by character recognition are used
only for retrieval, erroneous recognition does not pose
a serious problem provided that the recognition
incompleteness is compensated for. From this point of view,
the range of the character string to be retrieved is
substantially broadened to compensate for the recognition
incompleteness. The means for substantially broadening the
range of a character string to be retrieved is therefore
not limited to a particular means.
Effects of the Invention 
As described so far, according to the invention, in
retrieving a character string from character codes obtained
through character recognition of a document, the range
of the character string is substantially broadened so as
to compensate for the character recognition incompleteness,
so that the frequency of retrieval omissions
can be reduced considerably.
4. Brief Description of the Drawings 

Fig. 1 is a block diagram of an information storage
apparatus according to an embodiment of the invention,
Fig. 2 is a flow chart illustrating the operation of the
information storage apparatus of the embodiment, Fig. 3
is a conceptual diagram illustrating ideal character
recognition, Fig. 4 is a conceptual diagram illustrating
incomplete character recognition, and Fig. 5 is a flow
chart illustrating the operation of a conventional
information storage apparatus.

2... information processing unit, 3... storage device,
4... code input unit, 5... character recognition unit.

Name of Agent: Attorney Akira KOKAJI and two others 
FIG. 1

1... IMAGE INPUT UNIT, 2... INFORMATION PROCESSING UNIT,
3... STORAGE DEVICE, 4... CODE INPUT UNIT, 5... CHARACTER
RECOGNITION UNIT, 6... OUTPUT UNIT

FIG. 2 

(1)... START, (2)... READ DOCUMENT WITH IMAGE SCANNER, (3)...
STORE DOCUMENT IN STORAGE DEVICE AS IMAGE INFORMATION,
(4)... ENTER RETRIEVAL KEYWORD FROM KEYBOARD, (5)... STORE
KEYWORD IN STORAGE DEVICE, (6)... RECOGNIZE CHARACTERS
FROM IMAGE INFORMATION AND STORE CHARACTER CODES IN STORAGE
DEVICE AT PREDETERMINED LOCATION AS AUXILIARY INFORMATION,
(7)... REGISTRATION COMPLETED ?, (8)... ENTER RETRIEVAL
RANGE FROM KEYBOARD, (9)... ENTER RETRIEVAL CONDITION FROM
KEYBOARD, (10)... BROADEN CHARACTER STRING IN CONDITION
EQUATION BY USING CORRESPONDENCE TABLE, (11)... RETRIEVE
BY BROADENED RETRIEVAL CONDITION AND AUXILIARY INFORMATION
IN RETRIEVAL RANGE, (12)... NARROW DOWN RETRIEVAL RANGE ?,
(13)... SET RETRIEVAL RESULT TO RETRIEVAL RANGE, (14)...
OUTPUT RETRIEVAL RESULT, (15)... END

FIG. 5 

(1)... START, (2)... READ DOCUMENT WITH IMAGE SCANNER, (3)...
STORE DOCUMENT IN STORAGE DEVICE AS IMAGE INFORMATION,
(4)... ENTER RETRIEVAL AUXILIARY INFORMATION FROM KEYBOARD,
(5)... STORE AUXILIARY INFORMATION IN STORAGE DEVICE AT
PREDETERMINED LOCATION, (6)... REGISTRATION COMPLETED ?,
(7)... ENTER RETRIEVAL RANGE FROM KEYBOARD, (8)... ENTER
RETRIEVAL CONDITION FROM KEYBOARD, (9)... RETRIEVE BY
AUXILIARY INFORMATION IN RETRIEVAL RANGE, (10)... NARROW
DOWN RETRIEVAL RANGE ?, (11)... SET RETRIEVAL RESULT TO
RETRIEVAL RANGE, (12)... OUTPUT RETRIEVAL RESULT, (13)...
END